Classdesc and Graphcode: support for scientific 
programming in C++ 



Russell K. Standish 
School of Mathematics 
University of New South Wales 
R.Standish@unsw.edu.au 



http://parallel.hpc.unsw.edu.au/rks



Duraid Madina 
Department of Systems Studies 
University of Tokyo 
duraid@sacral.cu-tokyo.ac.jp 

October 23, 2008 



Abstract 

Object-oriented programming languages such as Java and Objective 
C have become popular for implementing agent-based and other object- 
based simulations since objects in those languages can reflect (i.e. make 
runtime queries of an object's structure). This allows, for example, a 
fairly trivial serialisation routine (conversion of an object into a binary 
representation that can be stored or passed over a network) to be written. 
However C++ does not offer this ability, as type information is thrown 
away at compile time. Yet C++ is often a preferred development environ- 
ment, whether for performance reasons or for its expressive features such 
as operator overloading. 

In scientific coding, changes to a model's code take place constantly,
as the model is refined, and different phenomena are studied. Yet tra- 
ditionally, facilities such as checkpointing, routines for initialising model 
parameters and analysis of model output depend on the underlying model 
remaining static, otherwise each time a model is modified, a whole slew of 
supporting routines needs to be changed to reflect the new data structures. 
Reflection offers the advantage of the simulation framework adapting to 
the underlying model without programmer intervention, reducing the ef- 
fort of modifying the model. 

In this paper, we present the Classdesc system, which brings many of
the benefits of object reflection to C++; ClassdescMP, which dramatically
simplifies coding of MPI based parallel programs; and Graphcode, a general
purpose data parallel programming environment.

1 Introduction



This paper describes Classdesc, ClassdescMP and Graphcode, techniques for
building high performance scientific codes in C++.

Classdesc is a technique for providing automated reflection capabilities in 
C++, including serialisation support. ClassdescMP builds on Classdesc's se- 
rialisation capability to provide a simple interface to using the MPI message 
passing library with objects. Graphcode implements distributed objects on a 
graph, where the objects represent computation, and the links between ob- 
jects represent communication patterns. It is a higher level of abstraction than 
the message passing paradigm of ClassdescMP, yet more general and powerful 
than traditional data parallel programming paradigms such as High Performance 
Fortran 2 or POOMA[TJ. 

This paper is organised into three sections, describing the three technologies 
in more detail. The final section concludes with a description of the current 
status of the code, and where it can be obtained from. 

2 Classdesc 

2.1 Reflection and Serialisation 

Object reflection allows straightforward implementation of serialisation (i.e. the 
creation of binary data representing objects that can be stored and later recon- 
structed), binding of scripting languages or GUI objects to 'worker' objects and 
remote method invocation. Serialisation, for example, requires knowledge of 
the detailed structure of the object. The member objects may be able to be 
serialised (e.g. a dynamic array structure), but be implemented in terms of a 
pointer to a heap object. Also, one may be interested in serialising the object in 
a machine independent way, which requires knowledge of whether a particular 
bitfield is an integer or floating point variable. 

Languages such as Objective C give objects reflection by creating class ob- 
jects and implicitly including an isa pointer in objects of that class pointing to 
the class object. Java does much the same thing, providing all objects with the
native (i.e. non-Java) method getClass() which returns the object's class at
runtime, as maintained by the virtual machine.

When using C++, on the other hand, at compile time most of the informa- 
tion about what exactly objects are is discarded. Standard C++ does provide 
a run-time type information mechanism (RTTI), however this is only required 
to return a unique signature for each type used in the program. Not only is this 
signature compiler dependent, it could be implemented by the compiler enumer- 
ating all types used in a particular compilation, and so the signature for a given 
type would differ from program to program! Importantly, standard RTTI does 
not provide any information on the internal structure of a type, nor methods 
implemented. 

The solution to this problem lies (as it must) outside the C++ language per
se, in the form of a separate program which parses the interface header files. A
number of C++ reflection systems do this: SWIG[4], being perhaps the oldest,
parses a somewhat simplified C++ syntax with markup, and exposes selected top
level objects to scripting languages. Reflex[T2], a more recent system
than Classdesc [5], is interesting in that it uses the GCC-XML parser (based on
the C++ front end of GCC) to parse the input file to build a dictionary of type
properties. Classdesc differs from these other attempts by traversing the data 
structures recursively at runtime, providing a genuine solution to serialisation, as 
well as allowing "drill down" of simulation objects in an interactive exploration 
of a running simulation. 

The functions that Classdesc generates to perform this traversal are generically
termed object descriptors. The object descriptor generator only needs to handle
class, struct and union definitions. Anonymous structs
used in typedefs are parsed as well. What is emitted in the object descriptor is 
a sequence of function calls for each base class and member, similar in nature to 
compiler generated constructors and destructors. Function overloading ensures 
that the correct sequence of actions is generated at compile time. 

For instance, assume that your program had the following class definition: 

class jellyfish: public animal
{
  double position[D], velocity[D], radius;
  int colour;
};

and you wished to generate a serialisation operator called pack. Then the
object descriptor generator will emit the following function definition for jellyfish:

#include "pack_base.h"

void pack(pack_t *p, string nm, jellyfish& v)
{
  pack(p,nm,(animal&)v);
  pack(p,nm+".position",v.position,is_array(),1,D);
  pack(p,nm+".velocity",v.velocity,is_array(),1,D);
  pack(p,nm+".radius",v.radius);
  pack(p,nm+".colour",v.colour);
}

The use of auxiliary types like is_array() improves resolution of overloaded
functions, without polluting global namespace further. This function is over-
loaded for arbitrary types, but is more than a template, so deserves a distinct
name. We call these functions class descriptors (hence the name Classdesc), or
simply an action for short.

Thus, calling pack(p,"",var) where var is of type jellyfish, will recursively
descend the compound structure of the class type, until it reaches primitive 
data types which can be handled by the following generic template defined for 
primitive data types: 

template <class T>
void pack_basic(pack_t *p, string desc, T& arg)
{p->append((char*)&arg, sizeof(arg));}

given a utility routine pack_t::append that adds a chunk of data to a repository
of type pack_t.

This can even be given an easier interface by defining the member template: 

template <class T>
pack_t& pack_t::operator<<(T& x)
{::pack(this,"",x); return *this;}

so constructions like buf << foo << bla; will pack the objects foo and bla 
into the object buf. 
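
To make the mechanism concrete, the following is a minimal, self-contained sketch of the pattern just described. It is not the Classdesc implementation: the pack_t here is a hypothetical stand-in holding a byte vector, the animal base class and its legs member are invented for illustration, and the descriptors that Classdesc would generate are written out by hand. Arrays are handled by a template overload rather than the is_array() tag used above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for Classdesc's pack_t: a growable byte buffer.
struct pack_t
{
  std::vector<char> data;
  void append(const char* p, std::size_t n) { data.insert(data.end(), p, p + n); }
  template <class T> pack_t& operator<<(T& x);   // defined at the end
};

// Primitive types are copied byte for byte (machine dependent).
void pack(pack_t* p, const std::string&, double& x)
{ p->append(reinterpret_cast<const char*>(&x), sizeof x); }
void pack(pack_t* p, const std::string&, int& x)
{ p->append(reinterpret_cast<const char*>(&x), sizeof x); }

// Fixed-size arrays: apply the descriptor to each element in turn.
template <class T, std::size_t N>
void pack(pack_t* p, const std::string& nm, T (&a)[N])
{ for (std::size_t i = 0; i < N; ++i) pack(p, nm, a[i]); }

const int D = 3;
struct animal { int legs; };          // invented base class for the example
struct jellyfish : animal
{
  double position[D], velocity[D], radius;
  int colour;
};

// Hand-written equivalents of the descriptors a generator would emit.
void pack(pack_t* p, const std::string& nm, animal& v)
{ pack(p, nm + ".legs", v.legs); }

void pack(pack_t* p, const std::string& nm, jellyfish& v)
{
  pack(p, nm, static_cast<animal&>(v));   // base class first
  pack(p, nm + ".position", v.position);
  pack(p, nm + ".velocity", v.velocity);
  pack(p, nm + ".radius", v.radius);
  pack(p, nm + ".colour", v.colour);
}

// Streaming interface: overload resolution picks the right descriptor.
template <class T> pack_t& pack_t::operator<<(T& x)
{ pack(this, "", x); return *this; }
```

With these definitions, buf << j for a jellyfish j appends the base class data followed by each member in declaration order, without any padding bytes.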

Classdesc is released as public domain software, and is available from the
Classdesc website¹. The pack and unpack operations work more or less as de-
scribed. The type xdr_pack, derived from pack_t, uses the standard unix XDR
library to pack the buffer in a machine independent way. This allows checkpoint 
files to be transported between machines of different architectures, or to run the 
simulation in a client-server mode, with the client downloading a copy of the 
simulation whilst the simulation is in progress. 

2.2 Object Exposure 

Another application of reflection is exposing object internals to an external envi- 
ronment, such as a scripting language, or another object oriented programming 
language. For computational science models, adding a scripting language has 
many advantages [TT]. Initialisation of the model is simply achieved by setting 
a few variables within a script. Data collected can be customised without code 
recompilation by simple script changes. GUI widgets can allow the real time 
monitoring of the model's variables in a graphical form during debugging and 
model development. A drill down facility can be readily provided in the script- 
ing language that allows the model to be stopped, and values of the model's 
variables queried. Being a scripted environment, the same executable can be 
used for exploration in a GUI mode, or for production in a batch mode, simply 
by using a different script. 

Automated techniques for exposing objects to a scripting environment, or to 
a different OO environment already exist. Examples include VTK[16], CORBA/IDL
and SWIG[4]. However, these either straitjacket the programmer into using
a particular programming style, or require the class definitions to be coded in a
different language (often termed an IDL). SWIG at least has the advantage of
being able to parse any ISO standard C/C++ code. Its strong advantage is that
it already has bindings for many popular languages, including TCL, Python, 
Perl and Java. Where Classdesc and SWIG might work well together is script-
ing Fortran applications. An experimental Fortran version of Classdesc (called
FClassdesc) was developed under a grant from the Australian Partnership for
Advanced Computing, with serialisation of Fortran modules and a descriptor to
produce C-syntax code for use as input to SWIG.

¹ http://ecolab.sourceforge.net/classdesc.html

The EcoLab² agent based modelling system uses Classdesc to expose C++
objects to the TCL scripting language [TU]. If the authors had been aware of
SWIG at the time of EcoLab's development, SWIG would probably have been
used instead, however, Classdesc's recursive approach to analysing data struc- 
tures is more useful for interactive exploration of scientific models in a simu- 
lation framework than SWIG's approach of requiring explicit exposure of top 
level objects only. 

Virtually any model that is implemented as a C++ object can be dropped
into EcoLab, and one instantly has a scriptable simulation system, with GUI
plotting and drill down tools and checkpointing functionality. The main pro-
gramming constraint is the DCAS requirement (§2.4), although departures from
DCAS tend to result in degraded capability rather than catastrophic failure
(such as uncompilable code).

The exposure of objects into TCL is handled by a descriptor TCL_obj. 
Simple data members generate a TCL command which returns the value of
that member, and sets the member if an argument is supplied (if corresponding
ostream::operator<< and istream::operator>> are defined).

Member functions whose arguments match a limited range of signatures are 
also callable from TCL. 
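
The following sketch illustrates the idea for simple data members only. It does not use TCL or the real TCL_obj descriptor: the registry type is a hypothetical stand-in for an interpreter's command table, the expose function name is invented, and the model struct is made up for the example. Each registered command returns the member's value and, if given an argument, first sets the member from it via the stream operators.

```cpp
#include <functional>
#include <map>
#include <sstream>
#include <string>

// Hypothetical command table: name -> command taking an optional new value
// (null pointer means "just query") and returning the current value as text.
typedef std::function<std::string(const std::string*)> command;
typedef std::map<std::string, command> registry;

// Descriptor for simple members; relies on operator<< and operator>>.
// The exposed variable must outlive the registry entry.
template <class T>
void expose(registry& r, const std::string& name, T& x)
{
  r[name] = [&x](const std::string* arg) {
    if (arg) { std::istringstream is(*arg); is >> x; }
    std::ostringstream os;
    os << x;
    return os.str();
  };
}

struct model { double rate; int steps; };   // invented example class

// What a generated descriptor for model might look like: one call per member.
void expose(registry& r, const std::string& name, model& m)
{
  expose(r, name + ".rate", m.rate);
  expose(r, name + ".steps", m.steps);
}
```

Because the descriptor recurses into members, exposing a single top level object makes every nested simple member addressable by its dotted name.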

2.3 Resource Acquisition Is Initialisation (RAII)

The RAII principle [21, §14.4.1] uses stack resident objects to control the lifetime
or states of objects elsewhere in the system, such as heap resident objects. One
of the simplest and most obvious applications of RAII is prevention of memory
leaks that occur through forgetting to destroy objects once they are no longer
needed. By making the raw pointer a private member of a class, placing the calls
to new and delete within member functions of the class and arranging for the
destructor to call a final delete to dispose of the object, we can then use this
class to declare a stack variable that controls the lifetime of a heap object. Figure
1 shows a simple implementation of ISO C99's variable length automatic
arrays, a feature that is not part of the C++ standard. Like the C version, data is allocated
when control passes through the statement declaring the array variable, and is
deallocated automatically when control leaves the scope containing the array
variable, relieving the programmer of having to remember to delete the object.
Unlike the C version, however, the data is actually allocated on the heap (via
the new statement called in the constructor) rather than the stack. This is often
advantageous as many modern operating systems restrict the stack size to a few
megabytes, which cannot support large arrays.

RAII is useful for many other tasks, such as ensuring files are closed and
flushed, network connections are terminated properly, software licenses released
and, very importantly, ensuring partly constructed objects are correctly cleaned
up in the event of an exception occurring [21, §14.4.1].

template <class T> class Array
{
  T* data;
public:
  Array(size_t n) : data(new T[n]) {}
  T& operator[](size_t i) {return data[i];}
  const T& operator[](size_t i) const {return data[i];}
  ~Array() {delete [] data;}
};

Figure 1: Implementation of C99 variable length automatic array feature in
C++.

² http://parallel.hpc.unsw.edu.au/rks/ecolab

By way of contrast, languages such as Java and C# do not allow the RAII 
technique to be deployed, as complex objects cannot reside on the stack. Instead, 
garbage collection is relied upon to release objects on the heap that are no longer 
needed. Since this occurs at rather indeterminate times (if ever), it cannot be 
relied upon for anything other than controlling memory leaks. 
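
The deterministic cleanup that RAII provides can be seen in a small sketch. The Array class below follows Figure 1, with copying disabled and a demo-only counter (live_arrays, an invented name) added so that the release can be observed; the risky and released_after_throw functions are likewise illustrative.

```cpp
#include <cstddef>
#include <stdexcept>

static int live_arrays = 0;   // demo only: number of arrays currently allocated

template <class T> class Array
{
  T* data;
  Array(const Array&);              // copying disabled in this sketch
  Array& operator=(const Array&);
public:
  explicit Array(std::size_t n) : data(new T[n]) { ++live_arrays; }
  ~Array() { delete[] data; --live_arrays; }
  T& operator[](std::size_t i) { return data[i]; }
};

// Throws part way through; the Array is still released during stack unwinding.
void risky(std::size_t n)
{
  Array<double> a(n);
  a[0] = 1.0;
  throw std::runtime_error("simulated failure");
}

// Returns true if the heap storage had been released by the time the
// exception reached the handler.
bool released_after_throw()
{
  try { risky(1000); }
  catch (const std::runtime_error&) { return live_arrays == 0; }
  return false;
}
```

No explicit delete appears in risky, yet the storage is freed at a well defined point: when control leaves the scope of a, whether normally or via the exception.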

One mistake seasoned Java or C# programmers make when writing C++
is to assume that the C++ new operator should be used in the same way as it
is used in the other languages. This leads to code that is hard to debug and
maintain, and has given C++ a reputation for being prone to memory
problems.

2.4 The DCAS principle 

The C++ compiler automatically provides a default constructor, a copy con-
structor and an assignment operator, if none are explicitly provided by the
programmer, which recursively call the default constructor, copy constructor or
assignment operator respectively of the base classes and members. The use of
Classdesc is analogous: Classdesc recursively applies its descriptor to base
classes and members. Since serialisation is the most important Class-
desc application, we call this the DCAS principle (Default constructor, Copy
constructor, Assignment and Serialisation). Classes whose members and base
classes are DCAS are also DCAS automatically, alleviating a lot of programmer
effort.

To create a DCAS object from a non-DCAS object requires wrapping. Prim- 
itive types (ints, floats, etc) are DCAS, although their default constructors do 
not initialise them to any particular value, so some care must be taken with 
default constructors of classes taking such types. Pointers, on the other hand,
are not DCAS at all. The default constructor for a pointer does not initialise the
pointer to a valid value. The copy constructor and assignment operator merely
copy the pointer, leaving two references to the same object.
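
The difference can be demonstrated with two small classes, both invented for this illustration. The first has only DCAS members, so the compiler-generated copy is a genuine copy; the second holds a raw pointer, so the generated copy aliases the original's data.

```cpp
#include <vector>

// All members are DCAS (in the copy sense), so the compiler-generated copy
// constructor and assignment operator do the right thing.
struct Good
{
  std::vector<double> x;
  Good() : x(3, 0.0) {}
};

// A raw pointer member: the generated copy constructor copies the address only.
struct Aliased
{
  double* x;
};

bool deep_copy_is_independent()
{
  Good a;
  Good b = a;       // copies the vector's contents
  b.x[0] = 1.0;
  return a.x[0] == 0.0;
}

bool pointer_copy_is_shared()
{
  double value = 0.0;
  Aliased a = { &value };
  Aliased b = a;    // copies the pointer, not the double it points to
  *b.x = 1.0;
  return *a.x == 1.0;
}
```

Both functions return true: modifying the copy of Good leaves the original untouched, whereas modifying through the copy of Aliased changes the data seen by the original.
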

To get around this problem, various solutions have been developed. The
C++ standard defines auto_ptr<T> [21, §14.4.2], which is DCAS (to a degree).
The default constructor sets the reference to NULL, and the copy and assignment
operators pass control of the target object to the target of the copy or assignment
operation, leaving the original object set to NULL, which breaks some notions of
"copy". It supports the notion of "resource acquisition is initialisation" or RAII,
so that the pointer is released when the auto_ptr object is destroyed. Its main
use is to provide a means for returning an object by reference from a function,
avoiding any performance penalties of a copy constructor or the possibility of
an exception being thrown during the copy constructor.

The Boost libraryJSj defined the shared_ptr<T> and intrusive_ptr<T> 
concepts, which allow for multiple references to a single object, whilst still sup- 
porting RAII, which are DCA. The shared_ptr<T> concept is so useful that it 
has been included into TR1 pQ , which is scheduled to be standardized as part of 
the next C++ standard. 

However, none of these concepts can be serialised, as these objects are ini- 
tialised to the address of an object created by an earlier new statement. The 
actual type of the object is unknown at serialisation time, only the base class 
T declared in the template argument is known. This arrangement allows the 
handling of polymorphic objects, which we will return to in §2.6.

The Classdesc package includes the ref<T> concept, which implements a 
reference counted dynamic reference class similar to Boost's intrusive_ptr<T> 
concept, which is serialisable (hence DCAS). Instead of creating the target ob- 
ject outside the ref class with a new statement, as done in Boost's smart pointer 
concepts, the target object is created on first dereference. This has the advan- 
tage that the reference counter can be stored alongside the object of type T 
on the heap like intrusive_ptr<T> does, without the need for T to support 
any reference counting API, however T is required to be DCAS. Polymorphism
(§2.6), which requires special treatment, is not supported by ref at all, however.
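
A much simplified sketch of such a handle is given below. It is not the Classdesc ref<T> implementation: the class name Ref, the nested Payload struct and the refCount method are assumptions made for illustration, and serialisation is omitted. It shows the two properties described above: the target is created on first dereference, and the counter lives beside the target on the heap without T having to provide any counting API.

```cpp
#include <cstddef>

// Hypothetical, simplified reference-counted handle; T must be default
// constructible. Not thread safe.
template <class T> class Ref
{
  struct Payload
  {
    T value;
    std::size_t count;
    Payload() : value(), count(1) {}
  };
  mutable Payload* p;     // null until first dereference
  void release() { if (p && --p->count == 0) delete p; p = 0; }
public:
  Ref() : p(0) {}
  Ref(const Ref& x) : p(x.p) { if (p) ++p->count; }
  Ref& operator=(const Ref& x)
  {
    Payload* q = x.p;
    if (q) ++q->count;    // increment first, so self-assignment is safe
    release();
    p = q;
    return *this;
  }
  ~Ref() { release(); }

  // The target is default constructed on first use, alongside its counter.
  T& operator*() const { if (!p) p = new Payload; return p->value; }
  T* operator->() const { return &**this; }
  std::size_t refCount() const { return p ? p->count : 0; }
};
```

One consequence of lazy creation in this sketch is that copying a handle before its first dereference yields two independent null handles, each of which creates its own target later.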

The use of reference counting (whether the Classdesc ref or the Boost ver- 
sions) allows heap allocated objects to be used as simply as they would with 
classic garbage collection. However copying reference counted references is about 
twice as expensive as simple pointer assignment, so under some circumstances, 
the use of such classes may be a performance issue. By judicious use of standard
C++ references, and function inlining, this performance impact can be amelio-
rated, and if necessary, bare pointers can be used within the innermost scope of
pointer chasing algorithms as a performance optimisation.

Reference counted references prove effective in implementing acyclic graph
structures: deleting the reference to the head node is sufficient to ensure that all
nodes are deleted. However, cycles of references will cause objects to remain in 
existence, even when no references remain to the graph structure. One possible 
means of dealing with this is to perform a graph walk at graph destruction 
time, deleting links to objects that have already been traversed in the walk, 
thus deleting any cycles. Then deleting the head node reference is sufficient to 
delete the entire structure. This operation is most conveniently handled in the 
destructor of some Graph class. 
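
The graph walk described above might be sketched as follows, using std::shared_ptr as the reference counted reference. The Node and Graph classes and the live_nodes counter are invented for this illustration. The walk is recursive, so very deep graphs would need an explicit stack instead.

```cpp
#include <cstddef>
#include <memory>
#include <set>
#include <vector>

static int live_nodes = 0;   // demo only: number of nodes currently alive

struct Node
{
  std::vector<std::shared_ptr<Node> > links;
  Node() { ++live_nodes; }
  ~Node() { --live_nodes; }
};

// Owns the head reference. The destructor drops every link that points to a
// node already reached by the walk, which leaves a tree that reference
// counting can reclaim once the head reference goes away.
class Graph
{
  static void breakCycles(Node& n, std::set<Node*>& visited)
  {
    visited.insert(&n);
    for (std::size_t i = 0; i < n.links.size(); ++i)
    {
      std::shared_ptr<Node>& l = n.links[i];
      if (!l) continue;
      if (visited.count(l.get())) l.reset();   // already traversed: drop link
      else breakCycles(*l, visited);
    }
  }
public:
  std::shared_ptr<Node> head;
  ~Graph()
  {
    std::set<Node*> visited;
    if (head) breakCycles(*head, visited);
  }
};
```

Each visited node keeps the one link through which it was first reached, so no node is destroyed during the walk itself; destruction happens when the head member is released immediately afterwards.
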

2.5 Pointers



Pointers create difficulties for Classdesc, since pointers may point to a single 
object, an array of objects, functions, members or even nothing at all. When 
array sizes are known at compile time, Classdesc issues an object descriptor 
that loops over the elements, however arrays allocated dynamically on the heap 
through the use of new cannot be handled, even in principle. 

Because pointers cause problems with the DCAS and RAII paradigms, it 
is worth discussing the uses that pointers are put to in C++, and alternatives 
that are available. Many of the uses have been inherited from C, where pointer 
usage is almost unavoidable in practical codes. Pointers are used in C++ for 
the following purposes: 

Passing by reference. This use is inherited from C, but superseded by C++'s 
reference types. 

Dynamic arrays. C++'s std::vector<T> container can be used for most dy-
namic array purposes without any extra overhead. It is DCAS, provided
the element type T is also DCAS, and follows the RAII technique. If a
standard container is not suitable, then a purpose-built container such as
that shown in Figure 1 can be provided.

Strings. std::string provides a safe and DCAS-ready alternative to char*
variables.

Graph structures. Classdesc provides the DCAS-ready ref<T> which is suit-
able for graphs and trees.

Polymorphic objects. Classdesc provides the poly<T> for handling polymor-
phic object hierarchies in a DCAS fashion (§2.6).

Opaque handles. Opaque handles are used to improve compile times by hid- 
ing the actual implementation details, including instance variables, in a 
separate compilation unit. This is not a major problem, but specific meth- 
ods must be provided by the programmer for construction, destruction, 
copying and serialisation of the object referenced by the opaque handle. 
These may call automatically generated versions of the methods in the 
separate compilation unit to reduce programmer burden. 

Libraries. C language API libraries will often use pointers to data structures. 
Where the details of these data structures are provided as part of the 
interface file, it is possible to use the automatically generated (whether 
compiler or Classdesc generated) methods to implement a DCAS wrapper 
around these objects. Where opaque handles are used, however, one's 
choices are limited depending on whether the appropriate methods for 
implementing copying and serialisation have been provided (some means of 
construction and disposing of objects will always be provided), or whether 
source code is available. 

It turns out that one can distinguish between member pointers and normal
pointers quite easily through overloading of object descriptors. Member point- 
ers are relevant for exposing an object's methods to a scripting interface, for 
example, and are also serialised and passed between processes in ClassdescMP 
to implement a form of remote procedure calling. Classdesc does not distin- 
guish between pointers to functions and pointers to objects as simple function 
overloading is not sufficient to distinguish them. However, the Boost library pro-
vides a template metaprogramming technique for distinguishing between func-
tion pointers and object pointers in its type_traits package, so providing
overloading for function pointers is planned for the future.
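
As an illustration of both points, the sketch below separates member pointers from ordinary pointers by overloading alone, and then uses a type trait to tell function pointers from object pointers. It uses the standard library's <type_traits> header (the C++11 descendant of the Boost facility) rather than Boost itself; the kind function, the probe struct and free_function are invented for the example.

```cpp
#include <string>
#include <type_traits>

// Ordinary pointers: overloading cannot separate functions from objects
// here, so a type trait is consulted on the pointed-to type.
template <class T>
std::string kind(T*)
{
  return std::is_function<T>::value ? "function pointer" : "object pointer";
}

// Member pointers have a distinct type form, so a separate overload
// catches them; the trait again separates methods from data members.
template <class C, class M>
std::string kind(M C::*)
{
  return std::is_function<M>::value ? "member function pointer"
                                    : "member data pointer";
}

struct probe { int n; void poke() {} };   // invented example class
void free_function() {}
```

A descriptor could dispatch on the same distinction, for instance exposing member function pointers to a scripting interface while warning about bare object pointers.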

By default, an attempt to serialise a pointer will issue a runtime warning. 
However, if pointer members are genuinely necessary, it is possible for the pro- 
grammer to specify that pointers either point to a single object of the specified 
DCAS type or are NULL if invalid. We call this the graphnode protocol. This 
situation is most likely to occur when using a "legacy" library that deals with 
pointers, and wrapping the data with something like ref is prohibitive. The 
gSOAP package;^] is an example. 

Within the Classdesc system it is possible to specify that all pointers of a
given pointer type satisfy the graphnode protocol, or that all pointers within
a given graph structure satisfy the graphnode protocol. The pack descriptor
then walks the graph structure keeping track of nodes visited so that cycles
are handled, and recursion is cut off to avoid stack limits being breached.
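
A minimal sketch of such a walk is shown below. It is an illustration only, not the Classdesc pack descriptor: the node and record types and the pack_graph function are invented, pointers are flattened to indices rather than bytes, and plain recursion is used, so the stack-limit safeguard mentioned above is not implemented.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical node type obeying the protocol described above: each pointer
// is either null or points to exactly one valid node.
struct node
{
  int value;
  node* next;
  node* other;
};

// Flattened form: the node's value plus its links as indices (-1 for null).
struct record { int value; long next, other; };

// Depth-first walk. 'seen' maps each visited node to its index, so shared
// nodes and cycles are emitted once and referred to by index thereafter.
long pack_graph(node* n, std::map<node*, long>& seen, std::vector<record>& out)
{
  if (!n) return -1;
  std::map<node*, long>::iterator i = seen.find(n);
  if (i != seen.end()) return i->second;
  long id = static_cast<long>(out.size());
  seen[n] = id;                       // record before recursing: handles cycles
  record r = { n->value, -1, -1 };
  out.push_back(r);
  long nx = pack_graph(n->next, seen, out);    // may reallocate 'out'
  long ot = pack_graph(n->other, seen, out);
  out[id].next = nx;
  out[id].other = ot;
  return id;
}
```

Unpacking would reverse the process: allocate one node per record, then convert the stored indices back into pointers.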

2.6 Polymorphism 

C++ has two notions of polymorphism, compile-time and runtime. Compile-
time polymorphism (aka generic programming) is implemented in terms of tem-
plates, and allows the provision of code that can work on many different types
of objects. On the other hand, runtime polymorphism involves the use of virtual
member functions. Wherever generic programming can solve a task, it is to be
preferred over runtime polymorphism, as virtual member functions introduce
procedure call overhead, and inhibit optimisation. Furthermore, the use of a
DCAS class like poly introduces additional overheads.

Nevertheless, there are situations that cannot be solved with compile-time
polymorphism, for example a container containing objects of varying types.
For this purpose, Classdesc's poly type is useful. To use poly, your object
hierarchy must implement the following interface (provided as an abstract base
class object).

struct object
{
  typedef int TypeID;
  virtual TypeID type() const=0;
  virtual object* clone() const=0;
  virtual void pack(pack_t *b) const=0;
  virtual void unpack(pack_t *b)=0;
  virtual ~object() {}
};

The type() method implements a simple runtime type identifier system. In
the case of object, it uses simple integer tags, which are assumed to be allocated
more or less consecutively to types in the type hierarchy. However, any type
may be used provided it is exported as the typedef TypeID, and an appropriate
customised type table class is defined (see below). One possibility, although by
no means the most efficient, is to use the object's type_info object returned
by C++'s inbuilt run time type identification system [21, §15.4.4].

It is not actually necessary to use this abstract base class to use poly. The 
base class (which must be default constructible, hence not abstract) is passed to 
the poly template. Classdesc provides an empty concrete class Eobject which 
can be used for this purpose. 

To assist in deriving classes from object, the Object template is provided. 

template <class This, int Type, class Base=object> struct Object; 

The first template argument This is the class you're currently defining, the 
second (Type) is the integer value of its type tag and Base is the base class you 
are deriving from. Eobject is defined as 

class Eobject: public Object<Eobject,0> {}; 

and a new class (e.g. foo) with type ID 1 can be defined

class foo: public Object<foo,1,Eobject> {...

This saves having to explicitly provide versions of the virtual functions type(),
clone(), pack() and unpack(), as these are provided by Object. It also pro-
vides a utility method cloneT() which executes clone(), but instead of return-
ing a bare object pointer, returns a pointer to an object of the same type as
the calling object (if legally convertible via dynamic_cast).
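
The way a template of this shape can supply the virtual functions is sketched below. This is a cut-down analogue written for illustration, not the Classdesc source: the pack() and unpack() methods are omitted, and the payload member of foo is invented. The technique is the curiously recurring template pattern, in which the derived class passes itself as the first template argument.

```cpp
// Simplified interface (serialisation methods omitted for brevity).
struct object
{
  typedef int TypeID;
  virtual TypeID type() const = 0;
  virtual object* clone() const = 0;
  virtual ~object() {}
};

// Hypothetical cut-down analogue of the Object template described above.
template <class This, int Type, class Base = object>
struct Object : Base
{
  // The type tag is a compile-time constant baked into each class.
  typename Base::TypeID type() const { return Type; }
  // Copy-constructs the most derived type named by the template argument.
  object* clone() const { return new This(static_cast<const This&>(*this)); }
  // Same as clone(), but typed; caller owns the result.
  This* cloneT() const { return dynamic_cast<This*>(clone()); }
};

struct Eobject : Object<Eobject, 0> {};
struct foo : Object<foo, 1, Eobject>
{
  int payload;               // invented member for the example
  foo() : payload(0) {}
};
```

With this in place, a foo accessed through an object pointer reports type 1 and clones itself as a foo, without foo declaring any of the virtual functions.
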
The synopsis of poly is: 

template <class T=Eobject, class TT=SimpleTypeTable<T> >
class poly
{
public:
  TT TypeTable;
  poly();
  poly(const poly& x);
  poly(const T& x);
  poly& operator=(const poly& x);
  poly& operator=(const T& x);

  template <class U> void addObject();
  template <class U, class A> void addObject(A);
  template <class U, class A1, class A2> void addObject(A1, A2);

  T* operator->();
  T& operator*();
  const T* operator->() const;
  const T& operator*() const;

  template <class U> U& cast();
  template <class U> const U& cast() const;
  void swap(poly& x);
};

Most of this is fairly straightforward. However the addObject() and cast()
methods need a little more explanation. To make the poly object an object of
type (say) foobar, use the following calls:

poly.addObject<foobar>();               //calls foobar()
poly.addObject<foobar>(1);              //calls foobar(1)
poly.addObject<foobar>(1,"hello");      //calls foobar(1,"hello")
poly.addObject<foobar>(foobar(x,y,z));  //more than 2 arguments

The cast method provides a convenient means of casting the poly object to
a specific type. It is equivalent to calling dynamic_cast, but a little easier to
use, i.e.

poly.cast<foobar>().grunge() <=> dynamic_cast<foobar&>(*poly).grunge();

The return type was chosen to be a reference, not a pointer, as this is the more 
convenient form. It can easily be converted to a pointer with the & operator. 
The TypeTable member of poly must implement the following interface

class typetable
{
  Base& operator[](TypeID);
  void register_type(const Base&);
};

where Base is the base type of the poly class. The type table is basically a database of
reference objects, from which new objects can be constructed using clone(),
given a type identifier. This is used for implementing serialisation. Classdesc
provides a simple implementation of this as SimpleTypeTable<Base>, where the
TypeIDs are integers that are reasonably close to each other.
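
A possible shape for such a table is sketched below, under the assumption that type IDs are small non-negative integers. It is not the Classdesc SimpleTypeTable: the TypeTable class, the shape and circle classes and the make helper are invented for illustration, and looking up an ID that was never registered is not handled.

```cpp
#include <cstddef>
#include <vector>

struct shape                       // hypothetical base class for this sketch
{
  typedef int TypeID;
  virtual TypeID type() const { return 0; }
  virtual shape* clone() const { return new shape(*this); }
  virtual ~shape() {}
};
struct circle : shape
{
  TypeID type() const { return 1; }
  shape* clone() const { return new circle(*this); }
};

// Table of reference objects indexed by type ID; it owns its reference copies.
template <class Base> class TypeTable
{
  std::vector<Base*> refs;
  TypeTable(const TypeTable&);              // non-copyable in this sketch
  TypeTable& operator=(const TypeTable&);
public:
  TypeTable() {}
  ~TypeTable() { for (std::size_t i = 0; i < refs.size(); ++i) delete refs[i]; }
  void register_type(const Base& x)
  {
    std::size_t id = static_cast<std::size_t>(x.type());
    if (id >= refs.size()) refs.resize(id + 1, nullptr);
    delete refs[id];                        // replace any earlier registration
    refs[id] = x.clone();
  }
  Base& operator[](typename Base::TypeID id)
  { return *refs.at(static_cast<std::size_t>(id)); }
};

// Deserialisation step: given a type tag read from a buffer, create a new
// object of the matching dynamic type; the caller owns the result.
template <class Base>
Base* make(TypeTable<Base>& t, typename Base::TypeID id) { return t[id].clone(); }
```

During unpacking, the type tag would be read first, make() would produce an object of the right dynamic type, and that object's own unpack method would then read the remaining data.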

2.7 Member Privacy 

Serialisation descriptors need access to all members of an object, including pri- 
vate and protected ones. Since in C++ class namespaces are closed by design 
(no new members can be added, except by inheritance), descriptors need to be 
placed in a global or an open namespace. This means that friend declarations 
need to be added to all class definitions with private or protected areas. The 
convention adopted by Classdesc is to define two macros that expand to a list
of friend declarations for the descriptors, similar to the following:

#define CLASSDESC_ACCESS(type)\
friend void pack(pack_t *,eco_string,type&);\
friend void unpack(unpack_t *,eco_string,type&);

#define CLASSDESC_ACCESS_TEMPLATE(type)\
friend void pack<>(pack_t *,eco_string,type&);\
friend void unpack<>(unpack_t *,eco_string,type&);

Then placing a CLASSDESC_ACCESS statement in the class definition allows 
the descriptor access to the private members of the class: 

class foo
{
  int bar;
  CLASSDESC_ACCESS(foo);
public:
  float bar2;
};

An auxiliary program insert-friend is provided as part of the Classdesc 
package to automatically insert these macros into class definitions. 

For object exposure, only public members need to be processed by the de- 
scriptor. Classdesc provides a -respect_private flag to indicate that private 
and protected members should be ignored by the descriptor. 

3 ClassdescMP: easy MPI programming in C++

3.1 MPIbuf

MPI [17] is an industry standard API for constructing distributed memory par- 
allel applications using the message passing metaphor. Originally designed for 
use with Fortran77 and C, it primarily deals with passing arrays of simple types 
such as characters, integers or floating point numbers. In a later incarnation, 
C++ bindings to the library were provided as part of the MPI-2 standard. It 
primarily added support for the MPI namespace, communicators as objects and 
support for C++ exception handling. However, messages are fundamentally 
composed of arrays of simple types. 

Classdesc's general serialisation operation solves the problem of passing mes- 
sages of complex objects as the pack descriptor turns a sequence of complex 
objects into an array of bytes. In ClassdescMP, the MPIbuf type is derived from 
pack_t, so messages can be constructed in a streaming fashion, eg: 

buf << a << b << send(1);






which sends a and b to process 1. 

Streaming MPI messages is not new; it is used in PARA++[6], and in
OOMPI³, for example. However, in these packages, programmers are required
to provide explicit serialisation routines for complex types.

To receive a message, use MPIbuf::get():

buf.get() >> a >> b;

Optional arguments to get allow selective reception of messages by source and 
tag. 
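
The streaming syntax relies on a manipulator object that ends the message. The sketch below shows how that can be arranged with ordinary overloading. It does not use MPI and is not the ClassdescMP implementation: the Buf class is a hypothetical stand-in for MPIbuf, messages are delivered to an in-memory mailbox instead of another process, get() takes the receiving rank as an argument for that reason, and only plain data types are supported rather than Classdesc's general serialisation.

```cpp
#include <cstddef>
#include <cstring>
#include <map>
#include <vector>

// In-memory stand-in for the network: a queue of byte messages per rank.
typedef std::vector<char> message;
static std::map<int, std::vector<message> > mailbox;

// Manipulator: streaming one of these into a buffer dispatches the message.
struct send { int dest; explicit send(int d) : dest(d) {} };

class Buf
{
  message data;
  std::size_t pos;
public:
  Buf() : pos(0) {}

  // Plain data only in this sketch: append the object's raw bytes.
  template <class T> Buf& operator<<(const T& x)
  {
    const char* p = reinterpret_cast<const char*>(&x);
    data.insert(data.end(), p, p + sizeof x);
    return *this;
  }

  // The non-template overload is preferred for send objects, so the
  // manipulator triggers delivery instead of being packed.
  Buf& operator<<(const send& s)
  {
    mailbox[s.dest].push_back(data);
    data.clear();
    return *this;
  }

  // Fetch the oldest message queued for 'rank' (assumes one is waiting).
  Buf& get(int rank)
  {
    std::vector<message>& q = mailbox[rank];
    data = q.front();
    q.erase(q.begin());
    pos = 0;
    return *this;
  }

  // Extract the next object from the current message.
  template <class T> Buf& operator>>(T& x)
  {
    std::memcpy(&x, &data[pos], sizeof x);
    pos += sizeof x;
    return *this;
  }
};
```

In a real message passing setting the operator<< for the manipulator would hand the accumulated bytes to the transport layer, and get() would block until a matching message arrived.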

By setting the preprocessor macro HETERO, MPIbuf is derived from xdr_pack 
instead of pack_t. This allows ClassdescMP programs to be run on heteroge- 
neous clusters, where numerical representation may differ from processor to 
processor. 

In MPI-2 C++ bindings, the basic object handling messages is a communi-
cator. In ClassdescMP, an MPIbuf has a communicator. It also has a buffer,
and assorted other housekeeping members. Some of these are used for managing 
asynchronous communication patterns: 

{ 

MPIbuf buf; 

buf « a << isend(l) ; 

while (sometliing_to_do && !buf .sentO) do_something; 

buf . wait () ; 

buf « b « isend(2) ; 

> 

When buf goes out of scope, an implicit MPI_Wait is called to ensure that the 
message has been correctly sent. 

Often, one needs to perform all-to-all exchange of data. To do this, we use
an MPIbuf_array:

    {
      tag++;
      MPIbuf_array sendbuf(nprocs());
      for (unsigned proc=0; proc<nprocs(); proc++)
        {
          if (proc==myid()) continue;
          sendbuf[proc] << requests[proc] << isend(proc,tag);
        }
      for (int i=0; i<nprocs()-1; i++)
        {
          MPIbuf b;
          b.get(MPI_ANY_SOURCE,tag);
          b >> rec_req[b.proc];
        }
    }

This piece of code is copied verbatim from the Graphcode library (§4). Note
the use of a tag variable to ensure that unrelated groups of communication do
not get mixed up. Also, when sendbuf goes out of scope, an implicit
MPI_Waitall is called, which ensures that all messages in the group have been
sent.
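
The role of the tag can be shown with a small in-memory model. This is a sketch under stated assumptions: Mailbox and Message are hypothetical names, no MPI is involved, and the queue simply mimics a receive from any source that matches on the tag.

```cpp
#include <cassert>
#include <deque>
#include <string>

struct Message { int source; int tag; std::string payload; };

// Hypothetical in-memory stand-in for one process's incoming message queue.
class Mailbox {
  std::deque<Message> queue_;
public:
  void deliver(const Message& m) { queue_.push_back(m); }
  // Remove and return the first queued message carrying `tag`, from any
  // source. Returns false if no queued message matches.
  bool receive(int tag, Message& out) {
    for (std::deque<Message>::iterator i = queue_.begin(); i != queue_.end(); ++i)
      if (i->tag == tag) { out = *i; queue_.erase(i); return true; }
    return false;
  }
};

// A straggler from an earlier exchange (tag 7) arrives before the message
// belonging to the current exchange (tag 8); matching on the tag skips it.
inline std::string tagged_receive_demo() {
  Mailbox box;
  box.deliver({2, 7, "late message from previous exchange"});
  box.deliver({1, 8, "request for current exchange"});
  Message m;
  box.receive(8, m);
  return m.payload;
}
```

Without the tag test, the receive would return the straggler first, which is the mix-up that incrementing the tag for each group of messages is intended to prevent.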

3.2 MPISPMD 

Whilst MPIbuf and MPIbuf_array are the heart of ClassdescMP, there is also
some application framework support. Two programming models are supported:
an SPMD mode, which simply wraps up the MPI setup and teardown into
an object, and a master-slave mode in which the master thread controls slave
thread objects via remote method invocation.

The SPMD mode is rather similar to that of PARA++ [6]. By instantiating
an object of type MPISPMD, the MPI environment is initialised. One key
feature of Classdesc's implementation is that MPI_Finalize() is called from
the MPISPMD object's destructor. Not only does this save the programmer from
having to remember to do this, but it is also called during stack unwinding
if an exception is thrown. This alleviates the problem with some MPI
implementations (eg MPICH) which leave active threads running and consuming
CPU time if MPI_Finalize() is not called.
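
The destructor-based teardown can be demonstrated in isolation. In the sketch below, EnvGuard is a hypothetical class whose counters stand in for MPI_Init and MPI_Finalize; it is not the MPISPMD implementation.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical environment guard; the counters stand in for MPI_Init and
// MPI_Finalize so that the example runs without an MPI installation.
struct EnvGuard {
  static int& initialised() { static int n = 0; return n; }
  static int& finalised()   { static int n = 0; return n; }
  EnvGuard()  { ++initialised(); }
  ~EnvGuard() { ++finalised(); }
};

inline void run_model(bool fail) {
  EnvGuard env;  // set-up happens here
  if (fail) throw std::runtime_error("model blew up");
}  // tear-down happens here, on normal exit and during stack unwinding

// Returns true if the guard's destructor ran even though run_model threw.
inline bool finalised_after_exception() {
  int before = EnvGuard::finalised();
  try { run_model(true); } catch (const std::runtime_error&) {}
  return EnvGuard::finalised() == before + 1;
}
```

Note that C++ only guarantees stack unwinding when the exception is eventually caught, so a top-level try block in main is still advisable.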

3.3 MPIslave 

The master-slave mode is a more powerful feature of ClassdescMP. Setting up 
the structure of a master-slave program is very tedious and error prone. The 
MPIslave class is designed to make master-slave algorithms simple to program. 

When an MPIslave object is instantiated, a slave "interpreter" object is
instantiated on each process to receive messages from the master. As MPIslave
needs to know the type of object to be instantiated on the slave processes, it
is implemented as a template, with the type of slave object passed as the
template parameter.

A message sent to the slave process starts with a method pointer of type
void (S::*)(MPIbuf&), where S is the slave object type, followed by the
arguments to be passed. That method of the slave object is then called, with
the arguments passed through the MPIbuf argument, and any return values also
passed back through the MPIbuf argument:

    struct S
    {
      void foo(MPIbuf& args)
      {
        int x,y,r;
        args >> x >> y;
        /* ... compute r from x and y ... */
        args.reset() << r;
      }
    };

    int main(int argc, char** argv)
    {
      MPIslave<S> C;
      MPIbuf buf;
      int x=1, y=2;
      buf << &S::foo << x << y << send(1);
    }
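
The dispatch step on the slave side can be sketched without any message passing. The names ArgBuf, Adder and Interpreter below are hypothetical and the method pointer is handed over directly rather than received in a message, so this shows only the calling convention, not how MPIslave transports the pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical argument buffer: a simple FIFO of ints standing in for MPIbuf.
struct ArgBuf {
  std::vector<int> vals;
  std::size_t pos = 0;
  ArgBuf& operator<<(int x) { vals.push_back(x); return *this; }
  ArgBuf& operator>>(int& x) { x = vals.at(pos++); return *this; }
  ArgBuf& reset() { vals.clear(); pos = 0; return *this; }
};

// Example slave type: every remotely callable method takes an ArgBuf&.
struct Adder {
  void add(ArgBuf& args) {
    int x, y;
    args >> x >> y;
    args.reset() << x + y;   // the reply overwrites the arguments
  }
};

// The "interpreter": owns the slave object and invokes whichever method
// pointer it is handed, passing the buffer for arguments and results.
template <class S>
struct Interpreter {
  typedef void (S::*Method)(ArgBuf&);
  S slave;
  void exec(Method m, ArgBuf& args) { (slave.*m)(args); }
};

inline int dispatch_demo() {
  Interpreter<Adder> interp;
  ArgBuf buf;
  buf << 1 << 2;
  interp.exec(&Adder::add, buf);
  int r = 0;
  buf >> r;                  // read the return value back out of the buffer
  return r;
}
```

Using a single signature for all callable methods is what lets one interpreter loop serve any slave type.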

When the MPIslave object is destroyed on the master process, it arranges
for MPI_Finalize() to be called on the slave processes and for the slave
objects to be destroyed also.

MPIslave also has features for managing a pool of idle slaves:

    MPIslave<S> C(argc, argv);
    vector<job> joblist;
    int p;  /* declared outside the loop because it is used again below */
    for (p=1; p<C.nprocs && p<joblist.size(); p++)
      C.exec(C << &S::do_job << joblist[p]);
    while (p<joblist.size())
      {
        process_return(C.get_returnv());
        C.exec(C << &S::do_job << joblist[p++]);
      }
    while (!C.all_idle())
      process_return(C.get_returnv());

3.4 Access to underlying MPI functions 

The philosophy of ClassdescMP is not to hide the underlying MPI transport 
layer. It is possible to mix MPI calls with ClassdescMP calls, which may be 
done to provide a more efficient implementation of a particular operation, or 
to provide functionality not provided in ClassdescMP (reductions for example) . 
This allows ClassdescMP to concentrate on providing new functionality, rather 
than simply wrapping existing MPI functionality in a new syntax. 

In terms of performance, the only overhead ClassdescMP adds is copying 
data into the MPIbuf variable. In the case of sending a large array of a simple 
type, it may well be more efficient to call the appropriate MPI call directly. On 
the other hand, if one is sending a lot of different small variables, it is more 
efficient to marshal the data into a single array, before sending it as a single 
message, for which task ClassdescMP is extremely effective. 

This philosophy of coexisting with the underlying MPI library is in sharp
contrast with PARA++ [6], which was designed to allow the transport layer to
be swapped completely for another one (eg PVM). However, MPI is now so
ubiquitous that swapping the transport layer no longer seems to have much of
an advantage.

Currently, ClassdescMP is implemented completely in terms of MPI-1
functionality. As MPI-2 implementations become available, increased
performance and/or functionality dependent on MPI-2 may be added. The most
obvious MPI-2 feature to impact ClassdescMP is one-sided messaging, which
would allow the implementation of the global pointer concept [7].
Unfortunately, one-sided messaging appears to be the one area of MPI-2 left
out of existing implementations, or implemented badly. One could implement
one-sided messaging using a standard threads API, such as POSIX threads.
However, in real applications encountered to date, separating the
communication and computation steps (see §4.6) has proved effective, so we
have not needed to explore one-sided communication.

4 Graphcode 

Whilst MPISPMD and MPIslave provide rather simple application frameworks 
for message passing codes, Graphcode provides a far richer framework within 
which programming is closer to data parallel programming than the lower level 
message passing environment on which it is based. The underlying paradigm 
of Graphcode is objects distributed on a graph. Computation takes place within 
the objects (vertices of the graph), and communication takes place along the 
edges of the graph. 

Graphcode calls the ParMETIS parallel graph partitioner to partition
the graph across the available processors, given a suitable weighting of
computational and communication costs (which defaults to a uniform
weighting). Since the solution found by ParMETIS is a Pareto non-dominated
solution (no other partitioning exists that has better load balancing and
less communication), the costs do not need to be provided in any normalised
fashion: only the leading order of computational or communication complexity
need be provided.

Since traditional data parallel programs can be expressed as a graph (put 
aligned data elements on the same node, express communication patterns as 
graph links, eg shifts as nearest neighbour connections), it could be argued 
that Graphcode embraces and extends the data parallel programming model. 
However the data layout within a compute node differs. For instance, if one 
considers a 5-point stencil of some hypothetical 2 component field: 

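\begin{align}
u'_{i,j} &= v_{i,j} - \tfrac{1}{4}\left(v_{i-1,j} + v_{i+1,j} + v_{i,j-1} + v_{i,j+1}\right) \tag{1} \\
v'_{i,j} &= u_{i,j} - \tfrac{1}{4}\left(u_{i-1,j} + u_{i+1,j} + u_{i,j-1} + u_{i,j+1}\right) \notag
\end{align}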

then Graphcode will store $u_{i,j}$ next to $v_{i,j}$, whereas an HPF
implementation will store $u_{i,j}$ next to $u_{i+1,j}$. It remains to be
seen what impact this has on performance in typical situations.
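
The two layouts can be made concrete with ordinary containers. This is only a sketch: a std::vector of structs stands in for the object-per-node layout, a pair of arrays stands in for the field-per-array layout, and neither is claimed to be how Graphcode or any particular HPF compiler actually allocates memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Object-per-node layout: u and v of one grid point are adjacent in memory.
struct Cell { double u, v; };

// Field-per-array layout: all u values are contiguous, as are all v values.
struct Fields { std::vector<double> u, v; };

// Distance in bytes from address a to address b.
inline std::ptrdiff_t byte_gap(const void* a, const void* b) {
  return static_cast<const char*>(b) - static_cast<const char*>(a);
}

// True if u is followed directly by v of the same point in the first
// layout, and by u of the next point in the second layout.
inline bool layouts_demo() {
  const std::ptrdiff_t d = sizeof(double);
  std::vector<Cell> cells(4);
  Fields f;
  f.u.resize(4);
  f.v.resize(4);
  bool object_layout = byte_gap(&cells[0].u, &cells[0].v) == d;
  bool field_layout  = byte_gap(&f.u[0], &f.u[1]) == d;
  return object_layout && field_layout;
}
```

Which layout is faster depends on whether the inner loop touches both fields of a point or one field of many points.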

4.1 Graphcode objects

A Graphcode graph is represented by the Graph class. Nodes of the graph
are polymorphic objects, derived from the object abstract base class. Being
polymorphic allows more complex topologies, such as hypergraphs, where nodes
may belong to more than one grouping. For example, consider an object class
representing a human being, and another object class representing the
families that human being might belong to, for instance the family he or she
was born into, and the family he or she married into:

class human: public object 
{ 

}; 

class family: public object 
{ 

}; 

The relationship belongs to is represented by a link connecting a human object 
with a family object. The reverse link represents the relationship contains. 

Graphcode objects may be located on any processor, and may need to
be migrated to achieve dynamic load balancing. Objects are accessed through
proxy variables, of type objref. An objref contains the object's identifier,
its location (processor ID), and may be dereferenced to obtain access to the
object (if the object is located in the current process's address space), or
a copy of the object (if it exists in the current process's address space):

    class objref
    {
    public:
      GraphID_t ID;
      unsigned int proc;
      object& operator*();
      object* operator->();
      const object* operator->() const;
      bool nullref() const;
      inline void nullify();
      void addref(object* o, bool mflag=false);
    };

The member nullref() allows one to test whether the objref points to
a copy of the object in the current address space, and nullify() allows one
to remove the copy of the object. addref(&obj,mflag) points the objref at
object obj; setting mflag (managed flag) to true allows the objref destructor
to destroy the object.

Several virtual members need to be provided for any object, including
virtual serialisation members pack and unpack as described in §2.6, a
"virtual constructor", a "virtual copy constructor" and a virtual type
identifier. "Virtual constructors" are described in [51, §15.6.2], and the
exact procedure used in Graphcode is detailed below.

To migrate objects between processors, Graphcode will arrange the following 
sequence of operations: 

Code on source processor:

    objref a;
    MPIbuf b;
    b << a.ID << a->type() << *a << send(dest);

Code on destination processor:

    MPIbuf b;
    GraphID_t ID;
    int type;
    b.get() >> ID >> type;
    objref& a=objects[ID];
    if (a.nullref())
      a.addref(archetype[type]->lnew(),true);
    b >> *a;

Note the use of the "virtual constructor" lnew(). We use the type
information to index into a database of object archetypes, and call lnew() to
obtain a new object of that type. A programmer defining an object class foo
defines the virtual members as follows:

    class foo: public object
    {
    public:
      virtual int type() const {return vtype(*this);}
      virtual object* lnew() const {return vnew(this);}
      virtual object* lcopy() const {return vcopy(this);}
      virtual void lpack(pack_t *b);
      virtual void lunpack(pack_t *);
    };

The template function vnew() returns a pointer to a new object of the same
type as its argument, and vcopy() returns a copy of the object pointed to by
its argument. Both of these functions use the C++ new operator, so the
returned objects can be disposed of using delete at a later stage.

Graphcode implements its own runtime type identification. The standard
C++ RTTI typeid() call returns a complex object of type type_info. Not only
is it inefficient to transfer the whole type_info object via MPI, and
inefficient to use a complex object to index into the archetype database, we
also have the potential scenario of the object codes on different processors
being generated by different compilers (a heterogeneous computer), and hence
having incompatible type_info objects.

Graphcode's RTTI system is very simple. An object's virtual type member 
makes a call to the template function vtype, which places a version of itself into 
the archetype database: 

    template <class T> int vtype(const T& x)
    {
      static int t=-1;
      if (t==-1)
        {
          t=archetype.size();
          objref *o=new objref; o->addref(x.lnew());
          archetype.push_back(o);
        }
      return t;
    }
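
The combination of integer type identifiers, an archetype database and a "virtual constructor" can be tried out in a self-contained form. The sketch below uses hypothetical names (Base, archetypes, type_id, make_from_id) and std::unique_ptr in place of objref; it follows the same idea but is not the Graphcode source.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

struct Base {
  virtual ~Base() {}
  virtual Base* lnew() const = 0;        // "virtual constructor"
  virtual int type() const = 0;          // small integer type identifier
  virtual const char* name() const = 0;
};

// Hypothetical archetype database: one prototype object per registered type.
inline std::vector<std::unique_ptr<Base> >& archetypes() {
  static std::vector<std::unique_ptr<Base> > db;
  return db;
}

// One static counter per T: the first call registers a prototype and fixes
// T's identifier as its position in the database.
template <class T> int type_id(const T& x) {
  static int t = -1;
  if (t == -1) {
    t = static_cast<int>(archetypes().size());
    archetypes().push_back(std::unique_ptr<Base>(x.lnew()));
  }
  return t;
}

struct Human : Base {
  Base* lnew() const { return new Human; }
  int type() const { return type_id(*this); }
  const char* name() const { return "human"; }
};

struct Family : Base {
  Base* lnew() const { return new Family; }
  int type() const { return type_id(*this); }
  const char* name() const { return "family"; }
};

// Receiving side: rebuild an object knowing only its integer identifier.
inline std::unique_ptr<Base> make_from_id(int id) {
  return std::unique_ptr<Base>(archetypes().at(id)->lnew());
}

inline bool registry_demo() {
  Human h;
  Family f;
  int hid = h.type(), fid = f.type();    // first use registers both types
  std::unique_ptr<Base> copy = make_from_id(fid);
  return hid != fid && copy->type() == fid &&
         copy->name() == std::string("family");
}
```

In this sketch identifiers are handed out in order of first use, so for the integers to mean the same thing on every process, each process would have to touch the types in the same order.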

Having discussed the virtual function interface of object, we are now ready
to present the full definition of object:

    class object: public Ptrlist
    {
    public:
      /* serialisation methods */
      virtual void lpack(pack_t *buf)=0;
      virtual void lunpack(pack_t *buf)=0;
      /* virtual "constructors" */
      virtual object* lnew() const=0;
      virtual object* lcopy() const=0;
      virtual int type() const=0;
      virtual idxtype weight() const {return 1;}
      virtual idxtype edgeweight(const objref& x) const {return 1;}
    };

As well as the virtual members we have described, there are two weight
functions used by the ParMETIS partitioner, used to describe the
computational cost (weight()) represented by the object, and the
communication cost (edgeweight(x)) in transferring a copy of a remote object
x into local address space. As can be seen, these default to 1, but may be
overridden by the programmer of the derived object. Finally, object is
derived from Ptrlist, which is syntactically equivalent to a vector of
objrefs, and represents the objects linked to this object.

As can be seen, there is a lot of similarity between the Graphcode object
type and the object type used with the poly<T> class. Graphcode was the first
real application of Classdesc to a polymorphic data structure, and so its
design strongly influenced that of poly<T>, which was developed later. At a
later stage, we hope to migrate the Graphcode API to use the ref<T> and
poly<T> interfaces to more closely couple Graphcode with Classdesc.

4.2 omap 

Each object is identified by a unique identifier, so each process maintains
a map object Graph::objects that can be used to locate the objref
corresponding to a particular identifier. Graphcode supplies two possible map
objects: a vmap, using a std::vector, which is optimised for contiguous or
nearly contiguous ranges of object identifiers, and an hmap, a hash map
implementation suitable for non-contiguous identifiers. You select the
version of omap you wish to use by using the namespace graphcode_vmap or
graphcode_hmap as appropriate.
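
The trade-off between the two map flavours can be seen in a few lines. The classes VecMap and HashMap below are hypothetical illustrations built on std::vector and std::unordered_map; they are not the vmap and hmap implementations.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

typedef unsigned long Id;             // hypothetical identifier type
struct Ref { Id id; int proc; };      // hypothetical, much reduced proxy

// Vector-backed map: slot i holds identifier i, so storage grows with the
// largest identifier ever used.
class VecMap {
  std::vector<Ref> slots_;
public:
  Ref& operator[](Id id) {
    if (id >= slots_.size()) slots_.resize(id + 1);
    slots_[id].id = id;
    return slots_[id];
  }
  std::size_t storage() const { return slots_.size(); }
};

// Hash-backed map: storage grows with the number of identifiers in use.
class HashMap {
  std::unordered_map<Id, Ref> slots_;
public:
  Ref& operator[](Id id) {
    Ref& r = slots_[id];
    r.id = id;
    return r;
  }
  std::size_t storage() const { return slots_.size(); }
};

// Number of slots held after inserting the sparse identifiers 0 and 1000.
template <class Map> std::size_t sparse_storage() {
  Map m;
  m[0].proc = 0;
  m[1000].proc = 1;
  return m.storage();
}
```

With only two identifiers in use, the vector-backed version holds 1001 slots and the hash-backed version holds 2, which is why contiguous identifiers favour the former and scattered ones the latter.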

It might seem puzzling why the Graph type is not a template, with the
omap type as a template argument. The problem is that internally, objects
need to keep track of the map to which they belong in order to regenerate the
neighbourhood linklist after migration. Therefore, the map type will need to
be a template argument to the objref, but the map itself takes objref as a
template argument, unfortunately leading to a circular template definition of
objref:

    template <class map> class objref;

    template <class map>
    class omap: public map
    {
    };

    template <class map>
    class objref
    {
      omap<map> Map;
    };

    typedef std::map<int, objref<Map> > Map;
    omap<Map> foo;

In practice, compilers cannot cope with this code.

4.3 Standard library syntax

Wherever possible, the syntax of Graphcode's containers follows that of the
standard library, so should be familiar to C++ programmers. So Ptrlist and
omap have iterators, and an operator[]. One slight departure from the
standard library is that omap::iterator::operator*() returns an objref, not
pair<GraphID_t,objref>, as one might expect if one followed the std::map
model. The reason for this is that objref objects already contain the
object's identifier, and the pair construct is redundant and wasteful. It
also leads to clearer code.

4.4 Graph 

Having introduced objects, objrefs and omaps, we are now in a position to 
present a skeleton of Graphcode's Graph class. 

    class Graph: public Ptrlist
    {
    public:
      omap objects;
      void rebuild_local_list();
      void clear_non_local();
      template <class T> objref& AddObject(const T& type, GraphID_t id);
      void gather();
      void Prepare_Neighbours();
      void Partition_Objects();
      inline void Distribute_Objects();
    };

Graph contains two main data members: the objects database mentioned
previously, and a list of object references that refers to those objects
hosted in the current address space. This list is a base class of Graph,
allowing a simple loop of the form:

    for (Ptrlist::iterator i=begin(); i!=end(); i++)

to be, in effect, a data parallel operation. 

The member rebuild_local_list() refreshes this list after a migration of
objects, and the member clear_non_local() nullifies those objrefs that are
not hosted locally, reclaiming memory.

Creating a graph involves calls to AddObject to add an object of type T 
(which must be derived from object), and adding the links to each object to 
form the graph. For example, the code for a 2D 5-point stencil might look like: 

    for (i=0; i<nx; i++)
      for (j=0; j<ny; j++)
        AddObject(foo(),mapid(i,j));
    for (i=0; i<nx; i++)
      for (j=0; j<ny; j++)
        {
          objref& o=objects[mapid(i,j)];
          o->push_back(objects[mapid(i-1,j)]);
          o->push_back(objects[mapid(i+1,j)]);
          o->push_back(objects[mapid(i,j-1)]);
          o->push_back(objects[mapid(i,j+1)]);
        }

where the user supplied function mapid(,) converts a coordinate into a pin
identifier. Boundary conditions can be handled by returning the special
identifier bad_ID when no link is applicable. The Graph::AddObject and
Ptrlist::push_back() members refuse to add an object having a bad_ID
identifier.
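
A possible shape for the user supplied mapping function is sketched below. The identifier type GraphID, the marker bad_id and the grid dimensions are assumptions made for the example (Graphcode's own GraphID_t and bad_ID are not reproduced here), and the row-major numbering is just one convenient choice.

```cpp
#include <cassert>

typedef unsigned long GraphID;      // hypothetical identifier type
const GraphID bad_id = ~0UL;        // hypothetical "no such node" marker
const int nx = 4, ny = 3;           // example grid dimensions

// Row-major identifier for (i,j). Off-grid coordinates yield bad_id, which
// gives closed (non-periodic) boundaries once such links are rejected.
inline GraphID mapid(int i, int j) {
  if (i < 0 || i >= nx || j < 0 || j >= ny) return bad_id;
  return static_cast<GraphID>(i) * ny + j;
}

// Periodic alternative: wrap the coordinates so that every node has four
// neighbours and bad_id is never produced.
inline GraphID mapid_periodic(int i, int j) {
  int wi = ((i % nx) + nx) % nx;
  int wj = ((j % ny) + ny) % ny;
  return static_cast<GraphID>(wi) * ny + wj;
}
```

Keeping the boundary treatment inside the mapping function means the graph construction loops need no special cases at the edges.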

4.5 Distribution of data over multiple processors 

To distribute objects from the master thread to slave threads, according to
some specified distribution, assign the desired destination of the objects to
the proc member, then call Graph::Distribute_Objects(), which broadcasts the
entire graph to all nodes. There is an inverse Graph::gather() function that
gathers data from all the nodes into the master thread copy.

To partition the objects using ParMETIS, you must first distribute the graph
according to some distribution (no matter how naive and non-optimal), and
then call Graph::Partition_Objects() to redistribute the graph more optimally
by calling the ParMETIS library. Partition_Objects() can then be called
periodically to rebalance the load, if the graph contains mobile agents for
instance.

Whilst it is conceptually easiest to construct the entire computation on the
master process, and distribute the data using Graph::Distribute_Objects(),
it is possible for each process to construct just its part of the
computation, and for Graph::Partition_Objects() to rebalance the load without
all the data needing to pass through a single process's address space.

4.6 Communication and computation steps 

In typical Graphcode applications deployed to date, an update involves perform- 
ing a computation on each object using the values of the neighbouring objects, 
storing the results into a backing buffer graph, and then swapping the back- 
ing buffer with the original graph, typical of a synchronous updating scheme. 
Asynchronous schemes could be employed as well with due care. The only 
communication required is to ensure a copy of all neighbours residing on re- 
mote processes is transferred to the processor hosting the object being updated. 
Whilst this could be done as needed via one-sided messages, it is more efficient 
to batch up all the objects that need to be transferred so that only one mes- 
sage is sent between each pair of processes. Since the communication pattern 
is already described by the graph's links, all a programmer needs to do is
make a call to Graph::Prepare_Neighbours() before starting the computation
step. Returning to our 5 point stencil example (Eq. (1)), the update code
would be written as:

    graph->Prepare_Neighbours(); /* communication step */
    for (Ptrlist::iterator p=graph->begin(); p!=graph->end(); p++)
      {
        foo* b=fooptr(back->objects[p->ID]);
        b->u = fooptr(p)->v;
        b->v = fooptr(p)->u;
        for (Ptrlist::iterator n=p->begin(); n!=p->end(); n++)
          {
            b->u -= 0.25 * fooptr(n)->v;
            b->v -= 0.25 * fooptr(n)->u;
          }
      }
    swap(graph,back);

Here graph and back are the graph and backing buffer for the calculation. We
are also assuming a utility function fooptr() written by the programmer to
return a foo* pointer to the object. A typical implementation of this might
be:

    foo *fooptr(objref& x) {return dynamic_cast<foo*>(&*x);}
    foo *fooptr(Ptrlist::iterator x) {return dynamic_cast<foo*>(&**x);}

It is important to use the new dynamic_cast feature of C++ to catch errors
such as x not referring to a foo object, or an incorrect combination of
dereferencing and address-of operators. dynamic_cast will return a NULL
pointer in case of error, which typically causes an immediate NULL
dereference error. Old fashioned C style casts (of the type (foo*)) will
simply return an invalid pointer in case of error, which can be very hard to
debug.

It should be noted that a Graph object appears as a list of those objects local 
to the executing processor. So this code will execute correctly in parallel. Each 
time Prepare_Neighbours is called, the message tag is incremented, preventing 
subsequent calls from interfering with the delivery of the previous batches of 
messages. 
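
The synchronous update with a backing buffer can be exercised on a single process. In the sketch below, Node and LocalGraph are hypothetical stand-ins in which every node is local and links are plain indices, so there is no communication step; only the compute-then-swap pattern of the 5-point stencil is shown.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Node {
  double u, v;
  std::vector<std::size_t> links;      // indices of neighbouring nodes
};
typedef std::vector<Node> LocalGraph;  // hypothetical: all nodes local, no MPI

// Build an nx-by-ny periodic grid in which each node links to 4 neighbours.
inline LocalGraph make_grid(int nx, int ny) {
  LocalGraph g(static_cast<std::size_t>(nx) * ny);
  for (int i = 0; i < nx; ++i)
    for (int j = 0; j < ny; ++j) {
      Node& n = g[i * ny + j];
      n.u = n.v = 0;
      n.links.push_back(((i + nx - 1) % nx) * ny + j);
      n.links.push_back(((i + 1) % nx) * ny + j);
      n.links.push_back(i * ny + (j + ny - 1) % ny);
      n.links.push_back(i * ny + (j + 1) % ny);
    }
  return g;
}

// One synchronous step: read only from cur, write only to back, then swap.
inline void step(LocalGraph& cur, LocalGraph& back) {
  for (std::size_t p = 0; p < cur.size(); ++p) {
    back[p].u = cur[p].v;
    back[p].v = cur[p].u;
    for (std::size_t k = 0; k < cur[p].links.size(); ++k) {
      const Node& n = cur[cur[p].links[k]];
      back[p].u -= 0.25 * n.v;
      back[p].v -= 0.25 * n.u;
    }
  }
  std::swap(cur, back);
}

// A uniform field u = v = 1 maps to zero, since 1 - 4 * 0.25 = 0.
inline bool uniform_field_vanishes() {
  LocalGraph g = make_grid(4, 4);
  for (std::size_t p = 0; p < g.size(); ++p) g[p].u = g[p].v = 1;
  LocalGraph back = g;                 // same topology as g
  step(g, back);
  for (std::size_t p = 0; p < g.size(); ++p)
    if (g[p].u != 0 || g[p].v != 0) return false;
  return true;
}

// A spike v = 4 at node 0 contributes -1 to u at its neighbour, node 4.
inline double spike_neighbour_u() {
  LocalGraph g = make_grid(4, 4);
  g[0].v = 4;
  LocalGraph back = g;
  step(g, back);
  return g[4].u;
}
```

Because every write goes to the backing buffer, the result does not depend on the order in which nodes are visited, which is what makes the loop safe to run in parallel.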

4.7 Deployed applications and performance 

Within the EcoLab system [15], Graphcode is deployed with two of the example
models provided with the EcoLab software. These models are working scientific
models, not toy examples. The first model is the spatial Ecolab model,
where the panmictic Ecolab model (Lotka-Volterra ecology equations, coupled
with mutation) is replicated over a 2D Cartesian grid, and migration is
allowed between neighbouring grid cells. The performance of this model has
not been studied much yet.

The second model is one of jellyfish in assorted lakes on the islands of
Palau. Each jellyfish is represented as a separate C++ object, an approach
commonly called agent based modelling. The jellyfish move around within a
continuous space representing the lake, and from time to time bump into each
other. In order to determine if a collision happens in the next timestep,
each jellyfish must examine all the other jellyfish to see if its path
intersects that of the other. This is clearly an $O(N^2)$ serial operation,
which severely limits scalability of the model.

To improve scalability, the lake is subdivided into a Cartesian grid, and
each jellyfish is allocated to the cell describing its position. If the cells
are sufficiently large that the jellyfish will only ever pass from one cell
to its neighbouring cell in a given timestep, then only the jellyfish within
the cell, plus those within the nearest neighbours need to be examined. This
reduces the complexity of the algorithm to dramatically less than $O(N^2)$,
and also allows the algorithm to be executed in parallel. In the field of
molecular dynamics simulations, this method is often called a particle in
cell method. ParMETIS allows nodes and edges to be weighted, so in this case
we weight each cell by $w_i = n_i^2$, and each edge by $v_{ij} = n_j$. In
figure 2 the speedup (relative performance of the code running on n
processors versus 1 processor) is plotted for different stages of the
simulation. The simulation starts at 7am with the jellyfish uniformly
distributed throughout the lake. As the sun rises in the east, the jellyfish
track the sun, and become concentrated along the shadow lines. ParMETIS is
called repeatedly to rebalance the calculation. As the sun sets at around
5pm, the jellyfish disperse randomly throughout the lake. In figure 3 the
speedup is plotted as a function of simulation time, so the effect of load
unbalancing can be seen. It can be seen that Graphcode delivers scalable
performance in this application.

Figure 2: Speedup of the Jellyfish application with 1 million jellyfish in
the OTM lake. Speedup is reported as accumulated wall time for a single
processor divided by the accumulated wall time for n processors for that
point in the simulation. Note that the speedup from 16 processors to 32 is
more than double.
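
The saving from the cell subdivision can be quantified by counting candidate pairs. The sketch below is an illustration under simplifying assumptions (a square domain, a hypothetical Agent type holding only a position, no wrap-around at the edges) and counts comparisons rather than performing the collision test itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Agent { double x, y; };   // positions assumed in [0,size) x [0,size)

// Count the pairs examined when each agent is compared only with agents in
// its own cell and the 8 surrounding cells of an ncell-by-ncell grid.
inline std::size_t neighbour_pairs(const std::vector<Agent>& a,
                                   double size, int ncell) {
  std::vector<std::vector<std::size_t> > cells(
      static_cast<std::size_t>(ncell) * ncell);
  std::vector<int> ci(a.size()), cj(a.size());
  for (std::size_t k = 0; k < a.size(); ++k) {
    ci[k] = static_cast<int>(a[k].x / size * ncell);
    cj[k] = static_cast<int>(a[k].y / size * ncell);
    cells[ci[k] * ncell + cj[k]].push_back(k);
  }
  std::size_t pairs = 0;
  for (std::size_t k = 0; k < a.size(); ++k)
    for (int di = -1; di <= 1; ++di)
      for (int dj = -1; dj <= 1; ++dj) {
        int i = ci[k] + di, j = cj[k] + dj;
        if (i < 0 || i >= ncell || j < 0 || j >= ncell) continue;
        const std::vector<std::size_t>& c = cells[i * ncell + j];
        for (std::size_t m = 0; m < c.size(); ++m)
          if (c[m] > k) ++pairs;     // count each unordered pair once
      }
  return pairs;
}

// All-pairs baseline: N(N-1)/2 comparisons.
inline std::size_t all_pairs(std::size_t n) { return n * (n - 1) / 2; }

// Helper for experiments: n*n agents at the centres of unit squares.
inline std::vector<Agent> lattice(int n) {
  std::vector<Agent> a;
  for (int i = 0; i < n; ++i)
    for (int j = 0; j < n; ++j) {
      Agent ag = {i + 0.5, j + 0.5};
      a.push_back(ag);
    }
  return a;
}
```

For 100 agents spread one per cell over a 10 by 10 grid, the neighbourhood search examines 342 pairs against 4950 for the all-pairs search, and with a single cell the two counts coincide.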

Graphcode has also been deployed in a 3D artificial chemistry model [8],
exhibiting superlinear speedup over 64 processors due to the effectively
enlarged memory cache.

Figure 3: Speedup of the Jellyfish application as a function of simulation
time. This is differential speedup, calculated from the wall time needed to
simulate a 6 minute period, so does not include partitioning time.



5 Current Status 

Classdesc and Graphcode are open source packages written in ISO standard
C++. They have been tested on a range of platforms and compilers, including
Linux, Mac OSX, Cygwin (Windows), Irix and Tru64; gcc and Intel's icc for
Linux, as well as the native C++ compilers for Irix and Tru64.

The source code is distributed through a SourceForge project, available from
http://ecolab.sourceforge.net. The code is managed by the Aegis source code
management system, which is browsable through a web interface. Version
numbers of the form x.Dy are considered "production ready": they have been
tested on a range of platforms, and are more likely to be reliable. These
codes are also available through the SourceForge file release system. The
versions x.y.Dz are under active development, and have only undergone minimal
testing (ie they should compile, but may still have significant bugs).
Developers interested in contributing to the code base can register as a
developer of the system by emailing one of the authors.

Classdesc and Graphcode are also included as part of the EcoLab simulation
system, which is available from the same source code repository.